Existing approaches to 3D garment reconstruction either assume a predefined template for the garment geometry (restricting them to fixed clothing styles) or yield vertex-colored meshes (lacking high-frequency textural details). Our novel framework co-learns geometric and semantic information from an input monocular image for template-free, textured 3D garment digitization. More specifically, we propose to extend the peeled representation to predict pixel-aligned, layered depth and semantic maps from which the 3D garment is extracted. The layered representation is further exploited to UV-parametrize the arbitrary surface of the extracted garment without any human intervention, forming a UV atlas. Texture is then imparted onto the UV atlas in a hybrid fashion, first by projecting pixels from the input image into UV space for the visible regions and then by inpainting the occluded regions. We are thus able to digitize arbitrarily loose clothing styles while retaining high-frequency textural details from the monocular image. We achieve high-fidelity 3D garment reconstruction results on three publicly available datasets and generalize to internet images.
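For intuition, a minimal sketch of how pixel-aligned peeled depth layers can be back-projected into a single point cloud is given below; the pinhole intrinsics and the random input are placeholders, not values or code from the paper.

```python
import numpy as np

def peeled_depth_to_points(depth_layers, fx, fy, cx, cy):
    """Back-project a stack of peeled depth maps (L, H, W) into one point cloud.

    Each layer is pixel-aligned with the input image; zero depth marks pixels
    with no surface intersection at that peel.
    """
    points = []
    H, W = depth_layers.shape[1:]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    for depth in depth_layers:
        valid = depth > 0
        z = depth[valid]
        x = (u[valid] - cx) * z / fx
        y = (v[valid] - cy) * z / fy
        points.append(np.stack([x, y, z], axis=-1))
    return np.concatenate(points, axis=0)

# Example with random data (two peeled layers of a 256x256 image).
layers = np.random.rand(2, 256, 256).astype(np.float32) * 2.0
cloud = peeled_depth_to_points(layers, fx=500.0, fy=500.0, cx=128.0, cy=128.0)
print(cloud.shape)  # (N, 3)
```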
The over-parametrized nature of deep neural networks poses a significant hurdle to deployment on low-end devices with time and space constraints. Network pruning strategies that sparsify DNNs using iterative prune-train schemes are often computationally expensive. As a result, techniques that prune at initialization, prior to training, have become increasingly popular. In this work, we propose neuron-to-neuron skip (N2NSkip) connections, sparse weighted skip connections that enhance the overall connectivity of pruned DNNs. Following a preliminary pruning step, N2NSkip connections are randomly added between individual neurons/channels of the pruned network while maintaining the network's overall sparsity. We demonstrate that introducing N2NSkip connections into pruned networks yields significantly superior performance compared to pruned networks without them, especially at high sparsity levels. In addition, we present a heat-diffusion-based connectivity analysis to quantify the connectivity of the pruned network relative to the reference network. We evaluate the efficacy of our approach on two different preliminary pruning methods that prune at initialization, and consistently obtain superior performance by exploiting the enhanced connectivity induced by N2NSkip connections.
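A rough sketch of the idea follows: a sparse random skip mask is added between two non-adjacent layers of an already-pruned toy MLP, and an equal number of the smallest surviving weights are dropped so that overall sparsity is preserved. The layer sizes, 5% skip density, and rebalancing rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_n2nskip(weights, masks, src, dst, skip_density=0.05):
    """Add sparse neuron-to-neuron skip connections from layer `src` to layer `dst`.

    weights/masks: lists of (out, in) weight matrices and their binary pruning masks.
    Returns a skip mask of shape (n_dst_out, n_src_in) and rebalanced layer masks.
    """
    n_out = weights[dst].shape[0]
    n_in = weights[src].shape[1]
    skip_mask = (rng.random((n_out, n_in)) < skip_density).astype(np.float32)
    added = int(skip_mask.sum())

    # Drop the same number of smallest-magnitude surviving weights elsewhere,
    # so the overall connection budget of the network stays fixed.
    surviving = np.concatenate([np.abs(w[m > 0]) for w, m in zip(weights, masks)])
    if 0 < added < surviving.size:
        threshold = np.partition(surviving, added)[added]
        masks = [m * (np.abs(w) >= threshold) for w, m in zip(weights, masks)]
    return skip_mask, masks

# Toy pruned 3-layer MLP: 784 -> 256 -> 256 -> 10, ~90% of weights already pruned.
weights = [rng.standard_normal(s) for s in [(256, 784), (256, 256), (10, 256)]]
masks = [(rng.random(w.shape) < 0.1).astype(np.float32) for w in weights]
skip, masks = add_n2nskip(weights, masks, src=0, dst=2)
```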
Automatic generation and (user) authoring of realistic virtual terrain is much sought after for multimedia applications such as VR models and games. The most common representation adopted for terrain is the Digital Elevation Model (DEM). Existing terrain authoring and modeling techniques address some of these needs and can be broadly categorized as procedural modeling, simulation methods, and example-based methods. In this paper, we propose a novel realistic terrain authoring framework powered by a combination of a VAE and a generative conditional GAN model. Our framework is an example-based method that overcomes the limitations of existing methods by learning a latent space from a real-world terrain dataset. This latent space allows us to generate multiple variants of terrain from a single input, as well as to interpolate between terrains, while keeping the generated terrain close to the real data distribution. We also develop an interactive tool that lets users generate diverse terrains with minimalist inputs. We perform thorough qualitative and quantitative analyses and provide comparisons with other SOTA methods. We intend to release our code/tool to the academic community.
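A minimal sketch of the latent-space interpolation idea appears below; the `encode`/`decode` lambdas are dummy stand-ins for the trained VAE-cGAN, included only to show the call pattern.

```python
import numpy as np

def interpolate_terrains(encode, decode, dem_a, dem_b, steps=5):
    """Linearly interpolate between two DEM patches in latent space.

    encode: DEM (H, W) -> latent vector; decode: latent vector -> DEM (H, W).
    Returns `steps` terrains blending dem_a into dem_b.
    """
    z_a, z_b = encode(dem_a), encode(dem_b)
    alphas = np.linspace(0.0, 1.0, steps)
    return [decode((1 - a) * z_a + a * z_b) for a in alphas]

# Dummy encoder/decoder standing in for the trained model.
encode = lambda dem: dem.reshape(-1)[:64]
decode = lambda z: np.tile(z, 64 * 64 // z.size).reshape(64, 64)
variants = interpolate_terrains(encode, decode,
                                np.random.rand(64, 64), np.random.rand(64, 64))
```

Spherical interpolation is often preferred over linear interpolation in VAE latent spaces, but linear blending is kept here for brevity.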
Video conferencing has traditionally been a widely adopted telecommunication solution, but a lack of immersiveness is inherent in the 2D nature of its facial representation. Integrating Virtual Reality (VR) into communication/telepresence systems through head-mounted displays (HMDs) promises users a much better immersive experience. However, HMDs create an obstacle by occluding the user's facial appearance and expressions. To overcome these issues, we propose a novel attention-enabled encoder-decoder architecture for HMD de-occlusion. We also propose training a person-specific model using short videos (1-2 minutes) of the user captured in varying appearances, and demonstrate generalization to unseen poses and appearances. We report superior qualitative and quantitative results over state-of-the-art methods. We also present an application of this approach to hybrid video teleconferencing using existing animation and 3D face reconstruction pipelines.
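One common way to realize an attention-enabled skip connection in an encoder-decoder is sketched below in PyTorch; this is a generic attention gate written from general knowledge, and the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Re-weights encoder skip features using the decoder signal, so the
    decoder can focus on the occluded (HMD) face region."""
    def __init__(self, enc_ch, dec_ch, mid_ch):
        super().__init__()
        self.enc_proj = nn.Conv2d(enc_ch, mid_ch, kernel_size=1)
        self.dec_proj = nn.Conv2d(dec_ch, mid_ch, kernel_size=1)
        self.score = nn.Conv2d(mid_ch, 1, kernel_size=1)

    def forward(self, enc_feat, dec_feat):
        attn = torch.sigmoid(self.score(torch.relu(self.enc_proj(enc_feat) +
                                                   self.dec_proj(dec_feat))))
        return enc_feat * attn  # gated skip features passed on to the decoder

gate = AttentionGate(enc_ch=64, dec_ch=64, mid_ch=32)
out = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```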
In this paper, we develop a robust 3D garment digitization solution that generalizes well to real-world fashion catalog images with cloth texture occlusions and large body pose variations. We assume a fixed-topology parametric template mesh model for known garment types (e.g., T-shirts, trousers) and map high-quality texture from an input catalog image onto the UV map panels corresponding to the garment's parametric mesh model. We achieve this by first predicting a sparse set of 2D landmarks on the garment boundary. We then use these landmarks to perform thin-plate-spline (TPS) based texture transfer onto the UV map panels. Subsequently, a deep texture inpainting network fills the large holes in the TPS output (caused by view variations and self-occlusions) to produce consistent UV maps. Furthermore, to train the supervised landmark prediction and texture inpainting tasks, we generated a large set of synthetic data with varied textures and lighting, rendered from multiple views of garments in different poses. In addition, we manually annotated a small set of fashion catalog images from an online fashion e-commerce platform for fine-tuning. We conduct thorough empirical evaluations and show impressive qualitative results of the proposed 3D garment texturing solution on fashion catalog images. Such 3D garment digitization helps us address the challenging task of enabling 3D virtual try-on.
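The landmark-driven texture transfer step can be sketched as follows, using SciPy's thin-plate-spline RBF as a stand-in for the paper's TPS warp; the landmark coordinates below are invented for illustration.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp_coords(src_landmarks, dst_landmarks, query_points):
    """Map image-space points into UV-panel space with a thin-plate-spline fit
    on corresponding landmarks. src/dst landmarks and query_points: (N, 2)."""
    tps = RBFInterpolator(src_landmarks, dst_landmarks, kernel='thin_plate_spline')
    return tps(query_points)

# Hypothetical garment-boundary landmarks detected in the catalog image (pixels)
# and their fixed locations on the UV panel (normalized coordinates).
src = np.array([[120, 80], [300, 85], [90, 400], [330, 410], [210, 240]], float)
dst = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9], [0.5, 0.5]], float)

# Warp a regular grid of image pixels into the UV panel.
ys, xs = np.mgrid[0:512:16, 0:512:16]
uv = tps_warp_coords(src, dst, np.stack([xs.ravel(), ys.ravel()], axis=-1))
```

Holes in the resulting UV map (regions never hit by visible pixels) would then be filled by the inpainting network described in the abstract.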
3D human body reconstruction from a monocular image is an interesting and ill-posed problem in computer vision, with broad applications across multiple domains. In this paper, we propose SHARP, a novel end-to-end trainable network that accurately recovers the detailed geometry and appearance of 3D people from a monocular image. We propose a sparse and efficient fusion of a parametric body prior with a non-parametric peeled depth map representation of clothed models. The parametric body prior constrains our model in two ways: first, the network retains geometrically consistent body parts that are not occluded by clothing, and second, it provides body shape context that improves prediction of the peeled depth maps. This enables the recovery of fine-grained 3D geometric details with merely L1 losses on the 2D maps, given an input image. We evaluate SHARP on the publicly available Cloth3D and THuman datasets and report superior performance to state-of-the-art approaches.
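A small illustration (not the actual SHARP network) of the two ingredients described above: fusing a rendered parametric body prior with the image as extra input channels, and supervising peeled depth maps with a plain L1 loss. The channel counts and tensors are made up.

```python
import torch
import torch.nn.functional as F

def peeled_l1_loss(pred_depth, gt_depth):
    """L1 loss over a stack of pixel-aligned peeled depth maps, (B, L, H, W)."""
    return F.l1_loss(pred_depth, gt_depth)

# Hypothetical fusion: concatenate the RGB image with maps rendered from a
# fitted parametric body (e.g. SMPL) as conditioning channels.
image = torch.randn(1, 3, 256, 256)
body_prior = torch.randn(1, 2, 256, 256)           # rendered body depth + part mask
net_input = torch.cat([image, body_prior], dim=1)  # (1, 5, 256, 256)

pred = torch.randn(1, 4, 256, 256, requires_grad=True)  # 4 predicted peeled layers
gt = torch.randn(1, 4, 256, 256)
loss = peeled_l1_loss(pred, gt)
loss.backward()
```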
Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GLoVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embeddings have significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner. The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee," thus being human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GLoVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locates similar words.
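To illustrate why such logical embeddings are directly interpretable, the toy example below describes each word by a small set of defining context words and measures similarity as set overlap; the sets are invented for illustration, not learned Tsetlin Machine clauses.

```python
# Toy logical embeddings: each word is defined by a handful of context words.
logical_embedding = {
    "coffee": {"black", "cup", "hot", "drink"},
    "tea":    {"cup", "hot", "drink", "leaf"},
    "car":    {"road", "drive", "wheel", "engine"},
}

def jaccard(word_a, word_b, emb=logical_embedding):
    """Similarity as overlap of defining words -- directly human-readable."""
    a, b = emb[word_a], emb[word_b]
    return len(a & b) / len(a | b)

print(jaccard("coffee", "tea"))   # high overlap: {cup, hot, drink}
print(jaccard("coffee", "car"))   # no shared defining words -> 0.0
```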
Large training data and expensive model tweaking are standard features of deep learning for images. As a result, data owners often utilize cloud resources to develop large-scale complex models, which raises privacy concerns. Existing solutions are either too expensive to be practical or do not sufficiently protect the confidentiality of data and models. In this paper, we study and compare novel \emph{image disguising} mechanisms, DisguisedNets and InstaHide, aiming to achieve a better trade-off among the level of protection for outsourced DNN model training, the expenses, and the utility of data. DisguisedNets are novel combinations of image blocktization, block-level random permutation, and two block-level secure transformations: random multidimensional projection (RMT) and AES pixel-level encryption (AES). InstaHide is an image mixup and random pixel flipping technique \cite{huang20}. We have analyzed and evaluated them under a multi-level threat model. RMT provides a better security guarantee than InstaHide, under the Level-1 adversarial knowledge with well-preserved model quality. In contrast, AES provides a security guarantee under the Level-2 adversarial knowledge, but it may affect model quality more. The unique features of image disguising also help us to protect models from model-targeted attacks. We have done an extensive experimental evaluation to understand how these methods work in different settings for different datasets.
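A minimal numpy sketch of the blocktization, block permutation, and RMT-style random projection is given below; the block size, key handling, and projection details are simplified assumptions rather than the exact DisguisedNets procedure.

```python
import numpy as np

def disguise_image(img, block=8, seed=42):
    """Split a grayscale image into blocks, permute the blocks with a secret
    key, and apply a key-derived random projection to each block.
    Assumes image dimensions are divisible by the block size."""
    rng = np.random.default_rng(seed)          # the seed stands in for the secret key
    h, w = img.shape
    n, m = h // block, w // block
    blocks = (img.reshape(n, block, m, block)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, block, block))
    perm = rng.permutation(len(blocks))        # block-level random permutation
    proj = rng.standard_normal((block, block)) # random multidimensional projection
    disguised = np.stack([proj @ b for b in blocks[perm]])
    return (disguised.reshape(n, m, block, block)
                     .transpose(0, 2, 1, 3)
                     .reshape(h, w))

out = disguise_image(np.random.rand(64, 64))
```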
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches to addressing the COD has led to solving high-dimensional PDEs. This has opened the door to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). In addition, we show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
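The parameter-savings argument can be illustrated with a simple factored weight matrix; the sketch below uses a Kronecker product as a stand-in for a tensorized layer and shows the general idea only, not the specific TNN construction or the TNN Init scheme.

```python
import numpy as np

# A dense 256 x 256 layer stores 65,536 weights.
dense_params = 256 * 256

# A Kronecker-factored layer W = kron(A, B) with A, B of shape 16 x 16
# represents the same 256 x 256 linear map with only 2 * 256 = 512 parameters.
A = np.random.randn(16, 16)
B = np.random.randn(16, 16)
W = np.kron(A, B)                      # shape (256, 256)
factored_params = A.size + B.size

x = np.random.randn(256)
y = W @ x                              # same map, far fewer stored parameters
print(dense_params, factored_params)   # 65536 vs 512
```

Tensor-network layers generalize this kind of factorization to higher-order decompositions, which is where parameter savings of this flavor come from.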
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.